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Chapter  15 

A  Weak  Constraint  4D-Var  Assimilation 
System  for  the  Navy  Coastal  Ocean  Model 
Using  the  Representer  Method 


Hans  Ngodock  and  Matthew  Carrier 


Abstract  A  4D- Variational  system  was  recently  developed  for  assimilating  ocean 
observations  with  the  Navy  Coastal  Ocean  Model.  It  is  described  here,  along 
with  initial  assimilation  experiments  in  the  Monterey  Bay  using  a  combination  of 
real  and  synthetic  ocean  observations.  For  testing  a  new  assimilation  system  it  is 
advantageous  to  use  this  combination  of  real  and  synthetic  data  over  simplified 
cases  of  climatology  and  twin  data.  Assimilation  experiments  are  carried  out  in 
a  weak  constraint  formulation,  with  the  model’s  external  forcing  assumed  to  be 
erroneous  in  addition  to  initial  conditions.  The  system’s  ability  to  fit  assimilated  and 
non  assimilated  observations  is  assessed,  as  well  as  the  consistency  and  relevance  of 
the  retrieved  model  forcing.  Experiment  results  show  that  the  assimilation  system 
fits  the  data  with  relatively  high  prior  errors  in  the  initial  conditions  and  surface 
forcing  fluxes.  However,  the  retrieved  model  forcing  errors  are  well  within  the  range 
of  acceptable  corrections  according  to  an  independent  study. 


15.1  Introduction 

This  paper  presents  the  development  of  a  weak  constraint  4D  Var  data  assimilation 
system  based  on  the  representer  method  (Bennett  1992,  2002)  for  the  Navy  Coastal 
Ocean  Model  (NCOM).  NCOM  is  an  operational  ocean  model  that  has  been 
validated  (Marlin  2000;  Barron  et  al.  2006).  A  major  effort  to  implement  state-of- 
the-art  assimilation  schemes  was  undertaken  a  few  years  ago,  with  the  development 
of  a  3DVAR,  and  a  4D-Var  system  based  on  the  NCOM  numerical  code.  The 
3DVAR  system  is  used  for  assimilation  in  global  to  regional  scales,  while  the  4D  Var 
is  to  be  used  in  limited  area  models  with  in-situ  observations,  provided  initial  and 
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boundary  conditions  from  a  global  or  regional  model  assimilating  with  3DVAR 
Both  the  adjoint  and  linear  perturbation  (also  called  the  forward  representer)  model 
codes  were  derived  for  the  most  part  with  the  help  of  the  Parametric  Fortran 
compiler  (PFC),  Erwig  et  al.  2007. 

Some  general  circulation  models  of  the  complexity  of  NCOM  have  seen 
similar  efforts  undertaken  in  the  past  decade:  a  4D-Var  assimilation  system  was 
developed  for  the  Ocean  Parallelism  (OPA)  model  (Weaver  et  al.  2003),  for  the  MlT 
general  circulation  model  (MITgcm,  Marotzke  et  al.  1999)  also  used  in  the  ECCO 
consortium  assimilation  experiments  (Stammer  et  al.  2002),  and  a  similar  system 
was  built  for  the  regional  ocean  model  system  (ROMS),  Moore  et  al.  2004.  Unlike 
the  other  models  using  fixed  z-levels  (OPA  and  MITgcm)  or  s-coordinates  (ROMS) 
NCOM  uses  a  combination  of  both  sigma  layers,  z-levels  and  a  generalized  vertical 
coordinate. 

It  is  a  common  practice  to  test  a  recently  developed  assimilation  system  with 
climatological  data  or  identical  twin  experiments  in  which  the  observations  are 
simulated  by  the  numerical  model.  There  is  hardly  a  case  of  failure  in  twin 
experiments,  yet  a  successful  assimilation  with  twin  experiments  never  guarantees 
success  with  real  data.  On  the  other  hand,  climatological  data  are  overly  smooth  in 
both  space  and  time  (due  mostly  to  linear  interpolation)  and  lack  the  variability 
associated  with  real  observations.  To  avoid  these  simplified  cases,  the  newly 
developed  NCOM  4D-Var  system  is  tested  with  real  and  synthetic  observations 
generated  by  the  modular  ocean  data  assimilation  system  (MODAS)  Fox  et  al.  2002. 
as  well  as  with  real  observations  collected  from  satellites  and  a  fleet  of  gliders  during 
the  second  autonomous  ocean  sampling  network  (AOSN  II)  in  the  Monterey  Bay. 

There  are  no  specific  applications  of  4D-Var  in  the  Monterey  Bay,  let  alone 
its  weak  constraint  formulation.  Strong  constraint  variational  assimilation  (Broquet 
et  al.  2009)  has  been  applied  to  the  California  current  system  (CCS),  including  an 
application  to  estimate  surface  forcing  correction  (Broquet  et  al.  2011),  using  the 
inverse  Regional  Ocean  Modeling  System  (IROMS,  Di  Lorenzo  et  al.  2007)  with 
horizontal  resolutions  of  10  and  30  km.  The  CCS  is  a  large  area  that  includes  the 
Monterey  Bay,  although  these  applications  did  not  specifically  target  the  Monterey 
Bay,  given  their  rather  coarse  resolutions.  Most  of  the  assimilation  experiments  that 
have  been  carried  for  the  Monterey  Bay  were  based  on  sequential  methods  such  as 
3DVAR  and  ensemble-based  Kalman  filters:  Chao  et  al.  (2009),  Haley  et  al.  (2009). 
and  Shulman  et  al.  (2009).  This  study  presents  an  application  of  the  weak  constraint 
4D-Var  in  the  Monterey  Bay  in  a  proof-of-concept  context,  using  synthetic  and  real 
observations.  The  first  objective  is  to  demonstrate  the  system’s  ability  to  reduce  large 
discrepancies  between  the  model  and  the  observations,  when  the  latter  are  assigned 
very  low  errors.  Therefore,  this  paper  is  more  focused  on  the  technical  development 
of  the  weak  constraint  4D-Var  system. 

A  brief  description  of  the  numerical  model  is  presented  in  the  next  sec¬ 
tion,  followed  by  the  4D-Var  system  derivation  and  implementation  in  Sect.  15.3. 
Section  15.4  deals  with  the  experiments  setup  and  results,  and  concluding  remarks 
follow  in  Sect  15.5. 
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15.2  The  Model 

NCOM  is  described  in  the  literature  (Martin  2000;  Barron  et  ah  2006).  The 
description  of  the  model  equations  given  in  the  appendix  is  only  repeated  in  order 
to  exhibit  the  nonlinear  terms  in  the  model  equations,  as  they  directly  affect  the 
development  of  the  linearized  and  adjoint  models  associated  with  NCOM.  NCOM 
is  a  free  surface  model  based  on  the  primitive  equations  and  employs  the  hydrostatic, 
Boussinesq  and  incompressible  approximations.  The  model  is  discretized  using 
finite  differences  on  an  Arakawa  C-grid  in  the  spatial  dimensions.  The  equations  are 
solved  in  three  dimensions  for  momentum  (both  zonal  and  meridional  components 
of  velocity),  temperature  and  salinity,  and  two  dimensions  for  the  free-surface  mode: 
surface  elevation  and  barotropic  velocities. 

The  leapfrog  scheme  is  used  for  time  stepping  in  conjunction  with  an  Asselin 
filter  to  avoid  time  splitting.  All  terms  are  treated  explicitly  in  time  except  for 
the  solution  for  the  free  surface  and  vertical  diffusion.  In  the  solution  for  the 
free  surface,  the  surface  pressure  gradient  terms  in  the  depth-averaged  momentum 
equations  and  the  divergence  terms  in  the  depth-averaged  continuity  equation  are 
evenly  split  between  the  old  and  new  time  levels  to  minimize  the  damping  of  surface 
waves.  The  model  equations  discretized  with  finite  differences  in  flux-conservative 
form  are  given  in  the  appendix. 

The  model  domain  used  for  this  experiment  contains  the  Monterey  Bay,  California 
region.  This  location  is  favorable  for  ocean  modeling  due  to  its  strong  land/sea  breeze 
circulation  patterns,  complex  coastline  with  steep  topography,  and  the  existence  of 
frequent  local  upwelling  and  relaxation  events  (Shulman  et  al.  2002).  The  domain 
covers  latitudes  35.6°-37.49°  North  and  longitudes  121.38°-123.2°  West  with  a 
horizontal  resolution  of  2  km  and  4 1  layers  in  the  vertical.  The  model  was  initialized 
on  01  August,  2003  and  ran  for  one  month  to  01  September,  2003.  The  initial  condi¬ 
tions  were  obtained  from  downscaling  the  operational  1  /8°  resolution  global  NCOM 
to  an  intermediate  model  with  horizontal  resolution  of  6  km,  and  then  via  a  3-to-l 
nesting  ratio  to  the  2  km  model.  Horizontal  viscosities  and  diffusivities  are  computed 
using  either  the  gnd-cell  Reynolds  number  (Re)  or  the  Smagonnsky  schemes,  both 
of  which  tend  to  decrease  as  resolution  is  increased.  The  grid-cell  Re  scheme  sets 
the  mixing  coefficient  K  to  maintain  a  grid  cell  Re  number  below  a  specified  value, 
e.g.  if  Re  =  u*dx/K  =  30,  then  K  =  u*dx/30.  Hence,  as  dx  decreases,  K  decreases 
proportionally.  A  similar  computation  is  performed  for  the  Smagonnsky  scheme. 

Surface  boundary  conditions  (e.g.  wind  stress,  IR  radiation  flux,  etc.)  are 
provided  by  the  atmospheric  mesoscale  model  COAMPS  (Hodur  1997),  which  is 
run  at  the  same  horizontal  resolution  as  the  ocean  model,  with  forcings  archived 
every  12  h  at  the  synoptic  times  of  0000  and  1200  UTC.  Open  boundary  conditions 
use  a  combination  of  radiative  models  and  prescribed  values  provided  by  the 
1/8°  Global  NCOM  (GNCOM).  Different  radiative  options  are  used  at  the  open 
boundaries  depending  on  the  model  state  variables:  a  modified  Orlanski  radiative 
model  is  used  for  the  tracer  fields  (temperature  and  salinity),  an  advective  model  for 
the  zonal  velocity  (u),  a  zero  gradient  condition  for  the  meridional  velocity  (v)  as 
^ell  as  the  barotropic  velocities,  and  the  Flather  boundary  condition  for  elevation. 


370 


H.  Ngodock  and  M.  Carrier 


15.3  The  4D-Var  System 
75. 3. 1  Lin  earization 


Nonlinear  terms  in  the  model  consist  of  all  the  advection  terms  in  the  momentum  and 
tracer  equations,  the  horizontal  mixing  with  the  Smagorinsky  f  ormula,  the  curvature 
correction,  the  vertical  mixing  with  coefficients  computed  using  the  Mellor-Yamada 
2.5  turbulence  closure.  Additional  nonlinearities  stem  from  the  discretization  in  flux 
conservative  form  where  vertical  increments  A z  in  the  sigma  layers  depend  on  the 
free  surface  elevation.  As  a  consequence,  even  the  time  discretization  is  nonlinear. 
Nonlinearities  also  appear  in  the  free  surface,  or  barotropic  mode,  with  the  multiplica¬ 
tion  by  the  depth  variables  Duand  Dv  in  (15.23)  and  (15.24).  However,  the  barotropic 
transports  Duu  and  Dvv  are  computed  explicitly  first,  then  the  barotropic  velocities 
(5  and  v)  are  derived  by  dividing  the  barotropic  transports  by  the  depth  variable, 
which  is  a  nonlinear  operation.  The  baroclinic  pressure  gradient  is  computed  from  the 
density  field  obtained  from  the  state  equation  as  a  nonlinear  function  of  temperature 
and  salinity.  Other  nonlinearities  appear  in  the  various  radiative  conditions  at  the  open 
boundaries  of  the  model  domain  mentioned  above. 

With  the  exception  of  the  Mellor-Yamada  turbulence  closure,  all  of  these 
nonlinear  terms  are  linearized  according  to  the  first-order  Taylor’s  approximation 
for  the  derivation  of  the  tangent  linear  model. 

For  the  sake  of  clarity,  let’s  rewrite  the  leap  frog  time  discretization  of  (15.14), 
see  the  appendix,  in  the  form 


((Az“)"+I  «',+1  -  (Az“)"-1  m""1)  =  G\  (15.1) 

where  Gn  represents  the  terms  in  the  right  hand  side  of  (15.14)  evaluated  at  time 
level  /i,  and  the  depth  increment  (Az")^  is  available  from  a  previously  computed 
elevation.  The  numerical  model  is  updated  by 

wn+1  = - 5-xr  f(AzT~‘  un~'  +  T  2A'  (15.2) 

(Azu)  +  L  AxuAyu  J 

The  linearization  of  (15.2)  is 

1 


Sun+'  = 


(Az“)"_1  Sun~ 1  +  (SAzT"1  w""1  + 


(Az“)n+1 
(MzT+1 
[(Az“r+1]2 


2A  t 


AxuA  yu 


-8G 


’} 


(Az“)"-‘  w"-1  +  -  2A‘  G" 
AjcuA  yu 


(15.3) 


where  u  is  the  background  solution,  i.e.  the  solution  around  which  the  model  is 
linearized,  G  and  A z  are  computed  using  the  background  solution,  and  <5w,  SG 
and  SAz  are  the  linear  perturbations  of  u,  G  and  A z  respectively,  In  both  (15  2) 
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Fig.  15.1  Time  evolution  of 
the  magnitude  of  the 
perturbation  to  the 
temperature  and  salinity 
fields  normalized  by  the 
magnitude  of  their  respective 
initial  perturbations 


TLM  stability 


(and  hence  ( 15.3))  a  small  positive  number  is  usually  added  to  the  denominator  to 
prevent  it  from  vanishing.  As  mentioned  above  the  depth  increments  in  the  vertical 
discretization  in  NCOM  depend  on  the  time  varying  elevation  only  in  the  sigma 
layers  In  the  z-level  portion  of  the  vertical  grid,  (15.2)  and  (15.3)  take  the  form 


and 


»  =  ,/»-■  + 


2A/ 


AxuAyuAzu 


Gn 


(15.4) 


8un  +  l  =  8un~l  + 


2A/ 


Ax“AyuAzu 


-8G\ 


(15.5) 


As  for  the  vertical  mixing  coefficients  from  the  Mellor-Yamada  turbulence  closure 
scheme,  they  arc  provided  by  the  nonlinear  model  trajectory  around  which  the  model 
is  linearized. 

The  stability  of  the  linearized  model  is  assessed  by  the  time  evolution  of  small 
perturbations:  the  tangent  linear  model  is  initialized  by  random  three  dimensional 
perturbations  of  the  temperature  and  salinity  fields  and  integrated  over  time.  At  each 
time  step  the  norms  of  the  perturbed  temperature  and  salinity  fields  are  computed 
and  divided  by  the  norms  of  their  respective  initial  perturbations  Results  plotted  in 
fig  15,1  show  that  the  linear  perturbations  are  stable  and  bounded  for  about  12-15 
days  before  they  start  to  grow  exponentially.  Initial  perturbations  here  arc  generated 
by  the  adjoint  integration  forced  by  Dirac  impulses  at  randomly  selected  grid  points, 
ihis  process  produces  three-dimensional  initial  fields  with  dynamically  coherent 
Mructures  compared  to  purely  random  fields  However,  the  TLM  test  with  purely 
random  fields  did  not  yield  different  results  (not  shown). 
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15.3.2  Adjoint  Derivation 

Onee  the  linear  perturbation  model  was  obtained,  the  adjoint  model  was  derived  bv 
transposition  of  the  perturbation  model  as  follows  for  both  sigma  layers  and  z-lcvcll 

a*  =  a;+i 
A"+'  =0 


)  fl  +  1  _  1  />+ 1 


[(Az“)"+,]‘ 


(15.6) 


(A;")"-1 

(Aju)"+1 


and 


a*  =  a;+1 
a;+1  =o 


(15.7) 


where  A^  denotes  the  adjoint  variable  associated  with  the  state  variable  a  at  the 
time  level  /,  and  A*  is  a  temporary  variable.  In  (15.6)  and  (15.7)  it  is  assumed 
that  the  adjoint  variables  have  been  initialized  at  a  prior  time  level.  In  practice,  the 
model  is  usually  computer  programmed  by  subroutines,  with  individual  terms  of 
the  model  equations  computed  in  separate  subroutines.  Similarly,  the  linearization 
and  the  adjoint  derivation  were  carried  out  one  subroutine  at  a  time,  and  eare  was 
taken  to  ensure  that  symmetry  between  the  linearized  subroutine  and  its  adjoint 
was  preserved.  The  entire  linearized  model  was  obtained  once  every  subroutine  was 
linearized,  and  the  entire  adjoint  was  obtained  with  individual  adjoint  subroutines 
appearing  in  reverse  order  as  compared  to  the  linearized  model. 

In  practice,  both  the  linearized  and  adjoint  models  were  obtained  with  the  help  of 
the  Parametric  Fortran  compiler  (PFC).  Parametrie  Fortran  is  an  extension  of  Fortran 
that  supports  defining  Fortran  program  templates  by  allowing  the  parameterization 
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of  arbitrary  Fortran  constructs.  A  Fortran  program  template  can  be  translated  into 
a  regular  Fortran  program  guided  by  values  for  the  parameters.  The  Parametric 
Fortran  compiler  is  written  in  Haskell  (Peyton  Jones  2003),  and  the  parameter  values 
are  represented  as  Haskell  values  so  they  can  be  used  by  the  Parametric  Fortran 
compiler  directly.  Parametric  Fortran  is  particularly  useful  in  scientific  computing. 
The  applications  include  defining  generic  functions,  removing  duplicated  code,  and 
automatic  differentiation.  Parametric  Fortran  thus  has  broader  and  more  general 
uses  than  previous  tools  in  the  likes  of  TAMC  (Giering  and  Kaminski  1998), 
TAPENADE  (Hascoet  and  Pascual  2004)  or  ADIFOR  (Bischof  et  al.  1992), 
developed  just  for  the  purpose  of  automatic  differentiation.  The  differentiation  is 
based  on  the  chain  rule,  with  special  treatment  for  non-differentiable  functions. 


15.3.3  How  PFC  Works  for  TL  and  Adjoint  Generation 

The  Parametric  Fortran  compiler  is  publicly  available  from  http://web.engr. 
oregonstate.edu/~erwig/pf/.  It  is  a  command  line  program  in  which  the 
differentiation  operation  has  been  parameterized  by  “Diff”.  Assuming  it  has  been 
installed  on  a  user’s  computer,  it  can  be  used  to  generate  tangent  linear  and  adjoint 
of  Fortran  subroutines  or  programs  in  the  following  manner: 

1.  The  user  creates  a  parameter  text  file,  say  “param.file”,  in  the  format: 

Diff  =  TL  [varl ,  var2,  var3  . .  .J 

where  varl,  var2,  var3  ...,  form  a  list  of  all  active  variables  and  all 
variables  that  depend  or  operate  on  active  variables  (including  temporary 
variables),  “TL”  will  Indicate  to  the  compiler  that  the  tangent  linear  model 
is  being  created,  and  “Diff"  is  the  differentiation  parameter  for  Parametric 
Fortran. 

2.  For  a  subroutine  “test.f”  to  be  differentiated  the  user  also  creates  a  file 
“test.pf"  that  contains  the  subroutine  In  the  form 

{Diff: 

Subroutine  test(var1  ,var2. . . ) 

Body  of  subroutine 

end 

} 

3.  Finally,  the  compiler  is  Invoked  by  typing  the  following  from  the  command 
line:  pfc  -p  paramJile  test.pf  test.TL.f 

The  output  of  the  compiler  will  be  the  tangent  linearized  subroutine 
“test.TL.f”. 

4.  The  procedure  for  generating  the  adjoint  is  the  same  except  that  In  steps 
1  and  3  “TL”  is  replaced  with  “AD”. 
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For  generic  state  variables  x  and  y  and  a  subroutine  computing  a  quantity  Ax,  the 
symmetry  between  the  linearized  subroutine  and  its  adjoint  is  evaluated  by 

(Ax,y)  =  (x,  ATy)  (15.8) 

where  (. , .}  denotes  an  inner  product.  This  equality  should  hold  to  machine  precision 
(regardless  of  computer  architecture)  not  only  for  individual  subroutines,  but  also 
for  the  entire  linearized  model  and  its  adjoint.  For  randomly  generated  x  and  v 
as  initial  and  final  conditions  for  the  linearized  and  adjoint  models  respectively, 
equality  (15.8)  was  tested  for  integration  periods  of  1  and  5  days  with  an  absolute 
difference  in  the  order  of  10~14  between  the  left  and  right  hand  side  of  (15.8),  the 
computations  being  done  in  double  precision. 

Alternatively,  this  symmetry  is  also  assessed  by  the  symmetry  of  the  representer 
matrix  (Bennett  1992,  2002).  For  a  given  number  M  of  observation  locations  (in  the 
space-time  domain),  regardless  of  which  model  variable  is  observed,  representer 
functions  are  computed,  one  per  observation  location.  A  representer  function 
associated  with  a  given  observation  location  is  obtained  by  integrating  the  adjoint 
model  forced  by  a  Dirac  delta  function  centered  at  the  chosen  observation  location, 
then  using  the  adjoint  solution  (in  space  and  time)  as  forcing  for  the  perturbation 
model.  A  column  of  the  representer  matrix  is  computed  by  evaluating  a  representer 
function  at  all  observation  locations.  If  the  adjoint  model  is  consistently  derived 
from  the  perturbation  model,  the  representer  matrix  should  be  symmetric  to  the 
machine  precision  which  is  the  case  for  our  model  and  its  adjoint. 


15.3.4  The  Cost  Function 

For  sake  of  clarity,  the  model  equations  are  rewritten  in  a  simpler  form 

+  (IJ 

|xo  =0)  =  ;<*)  + id)  j  I 

where  X  stands  for  all  the  dependent  model  state  variables:  two  dimensional  sea 
surface  height  and  barotropic  velocities,  and  three  dimensional  temperature,  salinity 
and  baroclinic  velocities,  F  is  the  model  tendency  terms  in  the  right  hand  side  ol 
(15.14,  15.15,  15.16,  15.17  and  15.18)  and  (15.23,  15.24and  15.25),  /  is  the  model  | 
error,  a  function  of  the  independent  variables  ( x  and  t  )  of  the  space-time  domain  i* 
with  covariance  C/,  I(x)  is  the  prior  initial  condition,  i(x)  is  the  initial  condition 
error  with  covariance  Q.  Given  a  vector  Y  of  M  observations  of  the  model  state 
in  the  space-time  domain,  with  the  associated  vector  of  observation  errors  e  (^,iy 
covariance  C€), 


>'m  =  FtmX  +  ew,  1  <  m  <  M 


(15.10) 
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where  Hm  is  the  observation  operator  associated  with  the  mth  observation,  one  can 
define  a  weighted  cost  function 


T  T 


o  n  o  n 


(15.11) 


n  n 


where  Q  denotes  the  model  domain,  the  weights  Wf  and  Wt  are  defined  as  inverses 
of  Cf  and  C/  in  a  convolution  sense,  and  W€  is  the  matrix  inverse  of  C,.  The  latter 
is  usually  considered  a  diagonal  matrix,  from  the  assumption  that  observation  errors 
are  uncorrelated.  Boundary  condition  errors  are  omitted  from  (15.9)  to  (15.1 1)  only 
for  the  sake  of  clarity.  The  model  error  covariance  is  assumed  to  take  the  form 


where  v(jc)  is  the  error  variance  and  L  and  x  are  the  length  and  time  scales 
respectively.  The  initial  error  covariance  C/  assumes  the  form  of  (15.12)  with  the 
exception  of  the  time  correlation  term  and  different  (higher)  variance.  Horizontal 
correlations  in  (15.12)  are  obtained  by  solving  a  diffusion  equation  (Derber  and 
Rosati  1989;  Egbert  et  al.  1994;  Weaver  and  Courtier  2001),  while  the  time 
correlation  is  obtained  by  solving  a  pair  of  coupled  Langevin  equations  (Chua 
and  Bennett  2001;  Bennett  2002;  Ngodock  2005).  Correlations  in  (15.12)  are 
univariate  and  are  implemented  layer  by  layer  for  each  model  state  variable.  The 
cross  correlations  are  provided  by  the  model  dynamics  through  the  integration  of  the 
adjoint  and  the  tangent  linear  models.  Note  that  although  the  cost  function  is  written 
with  the  inverse  of  the  covariance  functions,  the  actual  inverses  are  not  needed  in 
practice,  when  the  solution  of  the  Euler  Lagrange  equations  associated  with  the 
minimization  of  (15.11)  is  sought  through  the  representer  method  (Bennett  1992, 
2002). 


15.3.5  Error  Standard  Deviations:  v{x)x!1 

Assigning  model  errors  and  prescribing  their  covariances  is  the  most  difficult  task 
in  data  assimilation,  as  acknowledged  by  most  assimilation  experts:  Daley  (1992), 
Talagrand  (1999),  Bennett  (2002),  Wunsch  (2006).  Not  only  are  there  many  error 
sources  (external  forcing,  initial  and  boundary  conditions,  bad  parameterization, 
empirical  formulation,  unresolved  processes),  but  also  the  errors  cannot  be  mea¬ 
sured.  Therefore  one  can  only  make  assumptions  about  them.  Since  NCOM  includes 
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all  resolvable  processes  and  sub-gridscale  parameterization,  errors  are  attributed  to 
the  initial  conditions  and  external  forcing  for  all  the  dynamical  equations,  and  the 
derivation  of  their  estimates  is  given  below.  Note  that  there  is  no  external  forcing 
applied  to  the  continuity  equation,  and  thus  it  is  not  assigned  a  model  error  either 
as  in  Jacobs  and  Ngodock  (2003). 

Consider  the  momentum  equation  (15.14)  in  its  non-discretized  form 


du 


+  p~'F 


(15.13) 


where  F  represents  the  wind  stress  atmospheric  forcing  (in  Nm"2),  the  volume 
flux  source  and  the  tidal  potential,  and  p  is  the  water  density.  The  model  error 
at  the  surface  consists  of  errors  in  the  wind  stress.  For  the  subsurface,  errors  are 
assumed  to  arise  from  the  volume  flux  and  the  tidal  potential  terms.  We  consider 
errors  to  be  high  in  magnitude  at  the  surface  and  decreasing  with  depth.  Although 
the  wind  stress  varies  in  space  and  time,  its  associated  error  is  assumed  uniform 
in  the  horizontal  directions.  The  error  magnitude  is  considered  to  be  50  %  of  the 
actual  wind  stress  at  the  surface  and  decreasing  with  depth  in  order  to  mimic  the 
decreasing  impact  of  wind  stress  with  depth.  Two  terms  contribute  to  the  forcine 
for  the  temperature  equation:  the  net  longwave,  latent  and  sensible  heat  flux  on  one 
hand,  and  the  solar  radiation  on  the  other  hand.  Both  are  assumed  to  be  30  %  in  error 
and  the  sum  of  their  errors  constitutes  the  forcing  error  in  the  temperature  equation, 
with  a  spatial  distribution  similar  to  the  one  used  for  the  errors  in  the  momentum 
equation.  A  similar  approach  is  taken  for  the  errors  in  the  salinity  equation,  where 
the  forcing  consists  of  the  river  inflow  and  evaporation  minus  precipitation.  Forcing 
terms  here  are  also  considered  to  be  30  %  in  error.  Finally  the  standard  deviations 
for  the  initial  condition  errors  are  1  m  for  the  surface  elevation,  O.Sms-1  for  both 
components  of  the  velocity  field,  2  K  for  temperature  and  0.5FSU  for  salinity.  These 
rather  high  errors  indicate  the  lack  of  confidence  in  the  forcing  fields  and  initial 
conditions.  Spatial  and  temporal  correlation  scales  in  (15.12)  are  set  to  10  km  and 
30  h.  The  errors  and  scales  above  are  obviously  arguable,  and  it  is  not  our  intention 
to  defend  their  choice  Rather,  they  are  selected  in  this  preliminary  assimilation 
setup  to  demonstrate  the  functionality  of  the  NCOM  4D-Var  system.  Smaller  errors 
will  be  adopted  when  the  system  is  used  with  real  observations. 


15.3.6  The  Minimization 

The  solution  of  the  assimilation  problem  is  found  by  solving  the  Euler-Lagrange 
(EL)  system  of  equations  associated  with  the  minimization  of  the  cost  function 
(15.11).  The  EL  system  is  a  linear  yet  coupled  system  between  the  adjoint  and 
state  variables.  The  representer  methods  uncouples  the  system  by  expanding  the 
solution  as  the  sum  of  a  first  guess  and  a  finite  linear  combination  of  representor 
functions,  with  the  representer  coefficients  computed  by  solving  a  linear  system  in 
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data  spaee  involving  the  representer  matrix,  the  data  error  eovarianee  matrix  and 
the  innovation  veetor.  The  entire  representer  matrix  need  not  be  computed  sinec  the 
linear  system  ean  be  solved  using  an  iterative  algorithm  (e.g.  the  conjugate  gradient), 
by  taking  advantage  of  the  symmetry  of  each  matrix  involved.  The  representer 
coefficients  constitute  the  right  hand  side  of  the  adjoint  equation  in  the  EL  system. 
Once  the  representor  coefficients  are  computed,  they  arc  substituted  in  the  adjoint 
equation  whieh  is  then  solved  and  substituted  in  the  forward  linear  equation  for 
the  final  solution.  A  background  solution  around  whieh  the  model  is  Unearned  is 
needed.  Usually  it  is  the  solution  of  the  nonlinear  model.  For  the  first  guess  solution, 
one  may  consider  cither  the  background  or  the  tangent  linear  solution  around 
the  background.  Also,  the  new  optimal  solution  may  replaee  the  background  for 
another  minimization  process  (i.e.  outer  loops)  until  formal  convergence  (Bennett 
et  al.  1996,  1998, 2002;  Ngodock  et  al.  2000,  2007,  2009). 


15.4  Experiment  Setup  and  Results 

Assimilation  experiments  arc  earned  out  with  two  different  data  sets,  and  the  results 
shown  below  are  primarily  aimed  at  evaluating  the  4D  Var  system’s  ability  to  fit 
both  the  assimilated  and  the  non-assimilated  observations. 


15.4.1  MODAS  Data 

MODAS  generates  synthetie  vertical  profiles  of  temperature  and  salinity  in  the  two 
following  steps:  first,  a  subsurfaee  temperature  is  computed  at  a  given  depth  using  a 
regression  from  sea  surfaee  temperature  and  the  sterie  component  of  the  sea  surfaee 
height  anomaly.  Onee  the  subsurface  temperature  is  computed,  a  corresponding 
subsurfaee  salinity  is  computed  using  a  elimatology-based  temperature/salinity 
relationship,  Fox  et  al.  (2002).  MODAS  data  are  thus  a  combination  of  real  sea 
surfaee  data  (SSH  and  SST)  and  simulated  sub-surfaee  data  derived  from  the  real 
surface  data  using  regression  and  historical  relationships. 

MODAS  synthetics  are  saved  and  utilized  in  the  4D~Var  analysis  at  intervals  of 
6h  There  are  approximately  fifty-six  uniformly  distributed  profiles  of  temperature 
and  salinity  aeross  the  model  domain.  Each  profile  is  represented  on  a  vertical  grid 
of  46  layers  that  do  not  eoineide  with  the  model’s  vertical  grid  of  41  layers,  but  the 
observation  operator  11  in  (15.10)  handles  the  projection  from  the  model  grid  to  the 
data  grid.  Temperature  (salinity)  observation  errors  arc  set  to  0.2°C  (0.1  psu),  and 
held  constant  through  the  entire  assimilation  window.  These  observation  errors  are 
purposefully  set  low,  not  beeause  MODAS  data  arc  very  aeeurate,  but  to  test  the 
assimilation’s  ability  to  reduee  large  discrepancies  with  the  model,  i.e.  to  drive  the 
model  with  large  errors  to  fit  observations  with  small  errors. 
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Fig.  t5J  The  model  domain 
wilh  baihymciry  coniours  and 
lhc  profile  locations, 
including  lhc  numbered 
profiles  (in  red)  where  lhe 
assimilated  solution  is 
evaluated 
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15.4.2  Results  with  MODAS  Data 


Starting  from  an  initial  condition  on  August  02,  the  model  was  integrated  and 
the  assimilation  performed  for  5  days  at  a  time,  with  the  analysis  at  the  end  of 
the  5  days  becoming  the  initial  condition  for  the  following  5-day  assimilation  the 
overall  assimilation  experiment  interval  being  30  days. 

In  order  to  assess  how  well  the  assimilation  fits  the  observations,  the  analy^l 
is  examined  at  5  locations  in  the  model  domain  shown  on  Ftg.  15,2. 
locations  are  selected  according  to  their  geographic  position  with  respect  to  the  b 
offshore  (location  I ),  slightly  outside  of  the  bay  mouth  (location  2),  inside  the  bay  j 
(location  3),  and  south  and  north  of  the  bay  (locations  4  and  5).  Results  at  la 
2  and  4  are  similar  to  those  at  location  5,  and  therefore  are  not  shown. 

Examining  the  solution  in  the  top  500  m  at  the  offshore  location  1 ,  it  can  be  s 
that  the  assimilation  is  able  to  correct  large  and  small  discrepancies  between  the 
first  guess  and  the  observations  for  both  the  temperature  and  salinity  fields,  as  seen 
in  Fig.  15.3.  In  the  first  5  days  temperature  discrepancies  range  between  2 K  in  the 
upper  50  m,  and  about  1  K  from  100  m  and  below.  Likewise  salinity  discr 
range  from  0.15psu  in  the  upper  200  m  to  0.05psu  below.  These  discrepancies  ait 
gradually  corrected  in  the  analysis  (bottom  panels  of  Fig.  15.3)  and  by  the  cod 
of  the  first  5-day  assimilation  window,  they  have  vanished.  For  the  sub* 
5-day  assimilation  windows,  the  model  temperature  and  salinity  appear  to  be  util 
constrained  below  1 00  m  with  minimal  to  no  discrepancies  between  the  first  guess 
and  the  data.  Discrepancies  arc  confined  to  the  upper  100m,  They  arc  small  at  the 
beginning  of  each  5-day  window  and  grow  with  time.  This  is  to  be  expected  $ 
the  first  guess  is  initialized  with  the  previous  5-day  analysis  at  the  final  time,  and 
because  the  NOGAPS  forcing  fields  are  not  necessarily  compatible  with  M0C 
data.  That  the  discrepancies  are  confined  to  the  upper  ocean  also  suggests  that  the 
model  error  is  driven  by  erroneous  surface  fluxes,  although  the  simulation  of  the 
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Fig.  I5J  Time  evolution  of  the  absolute  value  of  the  innovation  {lop)  and  the  analysis  error 
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Fig.  15-4  Same  as  Fig.  1 5.3,  except  for  location  3 

,  mixed  layer  could  also  be  incompatible  with  MODAS  data.  Yet  both  the  data  and 
the  forcing  fields  are  purposefully  chosen  in  order  to  test  the  assimilation’s  ability 
to  efficiently  reduce  these  discrepancies  while  estimating  a  reasonable  (magnitude- 
wise)  correction  to  the  surfacp  fluxes.  The  assimilation  effectively  reduces  all  the 
discrepancies  to  within  the  da|a  standard  deviation  for  both  temperature  and  salinity. 

The  maximum  depth  at  loqation  3  inside  the  bay  is  28  m.  Results  at  this  location, 
shown  in  Fig.  15,4,  indicate  (hat  high  salinity  discrepancies  sometimes  exceeding 
.25psu  arc  distributed  through  the  water  column  during  the  first  5-day  assimilation 
period.  Some  large  salinity  discrepancies  also  appear  between  days  18-20.  Temper¬ 
ature  discrepancies  on  the  other  hand  are  more  prevalent,  distributed  over  space  and 
time.  It  appears  that  the  initialization  of  the  model  using  the  previous  5-day  analysis 
has  less  influence  on  the  current  5-day  first-guess.  This  may  be  due  to  the  fact  that  in 
this  shallow  location,  temporal  variability  of  the  solution  is  mostly  governed  by  the 
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local  extcrnal/surface  forcing  coupled  with  strong  mixing,  and  not  by  the  short-lr 
initial  conditions.  Nevertheless,  the  assimilation  still  significantly  reduces  A 
discrepancies  (boltom  panels  of  Fig.  15.4)  through  the  depth-time  domain  i 
for  some  isolated  places.  Assimilation  results  at  location  4  (south  of  the  i 
the  bay)  are  very  similar  to  those  at  location  2,  and  therefore  are  not  shown  I 
At  location  5  (north  of  the  mouth  of  the  bay)  the  maximum  depth  is  100m.  The  } 
largest  salinity  discrepancies  are  in  the  upper  20  m  during  the  first  5-day,  J 
in  Fig.  1 5.5.  There  are  also  some  moderate  discrepancies  in  the  lower  layers  3 
day  24.  Temperature  discrepancies  are  initially  moderate  (less  than  UK  du 
the  first  5-day  period)  and  remain  low  until  day  20,  after  which  they  start  j 
again,  reaching  2  K.  For  most  of  the  assimilation  period  these  discrepancy  vt 
significantly  reduced  below  0.5  K,  except  for  some  isolated  locations,  e.g.  i 
40  m  depth  at  days  2 1  and  22. 


15.4.3  AOSN II  Data 


The  dataset  comprises  SST  from  satellite  and  aircraft,  a  few  SSH  from  safdlai 
altimetry  (due  to  the  limited  area  of  the  model  domain),  vertical  profiles  of 
temperature  and  salinity  from  Slocum  and  Spray  gliders  and  two  moorings  (Ml  and 
M2)  and  AXBTs.  All  the  vertical  profiles  are  projected  on  a  static  grid  of  42  levdi  j 
Slocum  glider  tracks  covered  a  portion  of  the  bay,  the  mouth  of  the  bay  and 
the  area  to  the  northwest  of  the  bay,  i.e.  the  upwclling  center  around  Afio  Nu 
Spray  glider  tracks  originated  from  the  nearshore  and  went  offshore  in  transec-lika 
trajectories  as  seen  in  Fig.  1 5.6.  lb  avoid  redundancy  some  of  the  glider  data  m 
withheld  from  the  assimilation  and  used  for  validation  of  the  analyses.  Withholdug 
the  data  takes  into  account  the  model  grid  resolution  and  the  prescribed  horizoatd 
decorrelation  scale  of  the  model  error.  The  observations  are  assigned  a  constant  error 
of  0.5  K  and  0.3psu  in  temperature  and  salinity  respectively. 
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Fig.  15.6  All  glider  and  ihe  iwo  mooring  positions  (left),  and  assimilated  Slocum  (center)  and 
Spray  (right)  glider  tracks  during  August,  2003.  The  red  dots  represent  the  location  of  the  moored 
buoys  M I  (right)  and  M2  (left).  The  red  hex  iodic  ales  the  upwelling  center  near  Afio  Nuevo 


ISAA  Results  with  AOSN II  Data 

The  assimilation  covers  the  time  window  of  August  2  to  August  27,  2003,  and  is 
carried  out  in  cycles  of  5  days,  with  the  analysis  at  the  end  of  a  cycle  becoming  the 
initial  condition  for  the  following  cycle.  Although  the  observations  are  processed 
and  stored  in  6-h  intervals,  the  4D-Var  system  assimilates  all  observations  within 
the  5-day  cycle  simultaneously.  The  performance  of  the  assimilation  system  is 
examined  by  computing  the  difference  between  the  observations  and  three  model 
solutions;  (I)  the  free  running  (non  assimilative)  model  that  is  integrated  from  the 
given  initial  conditions  and  forcing  fields,  (2)  the  first  guess  (also  non  assimilative) 
for  which  the  initial  condition  is  updated  from  the  assimilation  in  the  previous 
eyclc,  with  the  exception  of  the  first  cycle  where  both  the  first  guess  and  the  free 
running  model  arc  equal,  and  (3)  the  analysis.  The  first  guess  is  also  the  background 
trajectory  for  the  tangent  linear  model  and  the  adjoint,  i.c.  the  trajectory  around 
which  the  model  is  linearized.  It  is  stored  in  intervals  of  6h.  It  is  anticipated  that 
due  to  the  re- initialization  from  assimilating  in  a  previous  cycle,  the  first  guess 
should  have  smaller  discrepancies  with  the  observations  than  the  free  running 
model,  and  the  analysis  should  have  smaller  discrepancies  with  the  observations 
than  the  first  guess.  This  should  be  the  case  for  discrepancies  computed  with  the 
assimilated  and  non-assimilated  observations.  It  is  expected  of  every  assimilation 
system  to  fit  the  assimilated  observations  within  one  observation  standard  deviation. 
Unassimilated  observations  consist  of  withheld  observations  within  the  current 
assimilation  window  and  future  observations,  those  in  the  next  cycle  before  the 
assimilation.  The  assimilation  is  expected  to  fit  the  former  as  a  measure  of  the 
system’s  ability  to  propagate  the  information  from  the  assimilated  observations  sites 
through  the  model  spate-time  domain  within  the  assimilation  window.  However, 
there  is  no  expectation  to  fit  future  observations,  i.c.  the  innovations  in  the  next 
cycle  are  not  expected  to  be  smaller  than  the  observation  standard  deviation.  One 
only  hopes  that  having  initialized  the  model  from  the  previous  cycle’s  assimilation, 
the  model  forecast  will  remain  sufficiently  accurate  to  maintain  small  innovations. 
However,  integrating  the  model  from  the  initial  conditions  with  uncorrected  forcing 
fields  is  prone  to  drive  the  model  away  from  the  observations. 
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Fig.  15.7  Absolute  model  temperature  (left)  and  salinity  (ngkt)  discrepancies  to  ( 
observations  for  the  free  run  (fop),  first  guess  (middle)  and  analysis  (bottom) 


m 


The  difference  between  the  observations  and  the  model  is  computed  for  i 
assimilated  profiles  of  temperature  and  salinity  and  plotted  in  chronc 
in  Fig.  1 5.7.  It  can  be  seen  that  the  temperature  differences  are  confined  in  the 
100m  of  the  water  column,  with  magnitudes  sometimes  reaching  3  K  for  bot| 
free  mn  and  the  first  guess.  Salinity  differences  extend  deeper  in  the  water  c 
to  about  200  m,  although  the  largest  differences  are  confined  to  the  upper  IOOslA 
slight  improvement  can  be  noticed  from  the  free  run  to  the  forecast  solutions  in  d 
temperature  field,  but  not  as  much  in  the  salinity  field.  However,  the  assimila 
able  to  significantly  reduce  the  forecast  discrepancies  in  both  the  temperaiungu 
salinity  fields,  with  the  exception  of  a  few  profiles  at  the  beginning  of  each  < 

The  assimilation  is  able  to  reduce  discrepancies  as  high  as  3  K  and  0.4psu  to  leaf  ] 
than  0.5  K  and  0.1  psu  in  temperature  and  salinity  respectively. 

The  forecast  solution  is  expected  to  have  smaller  discrepancies  to  the  < 
tions  than  the  free  run,  because  it  is  initialized  with  the  analysis  at  the  end  of  jl 
previous  cycle.  So,  having  only  a  marginal  improvement  from  the  free  run  to  Ac 
first  guess  is  an  indication  that  the  gains  from  the  assimilation  are  short-lived  in  ike 
forecast  run  as  a  consequence  of  inadequate  forcing  fields  driving  the  model  m|| 
from  future  observations 


15.4.5  Independent  Observations 


For  verification  and  evaluation  purposes,  discrepancies  are  computed  betweeatke 
withheld  glider  observations  and  the  three  model  solutions:  the  free  run,  the  fan 
guess  and  the  analysis.  Results  in  Fig.  15.8  show  that  all  three  solutions  have  sim2 
error  levels  with  respect  to  the  un-assimilated  as  to  the  assimilated  observation 
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Fig.  15.8  Same  as  Fig.  15.7,  except  for  non-assimilated  glider  observations 


This  result  was  expected  because  in  most  cases  the  withheld  observations  were 
located  in  the  vicinity  of  assimilated  observations.  There  are  still  some  large 
temperature  and  salinity  discrepancies  in  the  analysis,  usually  around  the  beginning 
of  the  assimilation  cycle. 


15.4.6  Qualitative  Fitting  of  the  Data 

The  assimilation  system’s  ability  to  fit  the  observations  is  further  examined  by 
comparing  the  differences  between  the  observations  and  the  free  running  model,  the 
first  guess  and  the  analysis  for  all  the  observations  and  at  all  times,  for  both  MODAS 
and  AOSN  II  data.  The  free  running  model  is  integrated  from  the  initial  conditions 
and  is  never  re-initialized,  while  the  first  guess  for  an  assimilation  cycle  is  initialized 
by  the  analysis  at  (he  end  of  the  previous  cycle.  Elements  of  these  difference  vectors 
are  binned  by  comparing  their  magnitude  to  the  observations  standard  deviation. 
For  example,  all  elements  that  are  smaller  than  a  standard  deviation  in  absolute 
value  arc  binned  together,  and  so  arc  all  elements  whose  absolute  value  is  between 
one  and  two  standard  deviations,  and  so  on.  The  number  of  elements  in  each  bin  is 
then  converted  into  a  percentage  of  the  number  of  assimilated  observations.  The 
results  plotted  as  a  cumulative  bar  chart  on  Fig.  15.9  show  that  the  assimilated 
solution  with  MODAS  data  fits  80  %  and  90  %  of  the  observations  to  within  one 
and  two  standard  deviations  respectively,  while  the  corresponding  numbers  for  the 
first  guess  are  60  %  and  75  %,  and  45  %  and  63  %  for  the  free  running  model.  Some 
posterior  misfits,  although  only  a  small  percentage,  are  larger  than  7  observations 
standard  deviations,  which  obviously  violate  the  Gaussian  assumption  on  the  errors 
in  general.  Similarly,  for  the  AOSN  II  data,  assimilated  solution  fits  86  %  and  95 
%  of  the  observations  to  within  one  and  two  standard  deviations  respectively,  while 
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Fig.  15.9  Cumulative  bar  chart  showing  ihe  percentage  of  the  number  of  observations  that ; 
matched  by  ihe  free  running  model  (black),  Ihe  first  guess  {grey )  and  lhc  analysis  {white)  as  a 
function  of  the  number  of  observation  standard  deviations.  MODAS  experiment  is  shown  on  the 
left  and  AOSN  D  experiment  on  the  right 


the  corresponding  numbers  for  the  first  guess  are  68  %  and  80  %,  and  64  %  and  76 
%  for  the  free  running  model. 

The  large  posterior  misfits  happened  for  some  temperature  observations  with 
prior  misfits  sometimes  higher  than  5°,  and  the  assimilation  reduces  these  misfits 
to  about  1.5°.  They  are  larger  than  7  standard  deviations  primarily  because  of  very 
low  data  errors  and  possibly  high  model  errors.  It  is  assumed  that  a  better  fit  would 
be  achieved  with  larger  observation  errors  and  lower  model  errors.  Such  experiments 
(not  shown  here)  are  carried  in  the  context  of  real  observations  and  are  the  subject 
of  another  study. 


15.5  Conclusion 


A  4D- Var  assimilation  system  for  NCOM  has  been  developed  based  on  the  indirect 
representer  method.  The  system  produces  analysis  increments  for  all  prognostic 
variables  (3D  temperature,  salinity,  u  and  v-  components  of  velocity,  and  sea 
surface  elevation)  from  a  time-window  of  observations  in  a  weak-eonstraint  env 
ronment.  The  adjoint  model  has  been  cheeked  against  the  linearized  model  using 
well  established  methods,  verifying  that  the  system  is  symmetric  to  within  machine 
precision.  Assimilation  experiments  were  carried  out  with  two  different  data  sets. 

The  first  experiment  involved  MODAS  synthetic  data  (7,  S,SSH)  that  were  sam¬ 
pled  every  6  h  and  assimilated  in  a  sequence  of  5-day  time  windows.  Starting  from 
an  initial  condition  on  August  02,  the  model  was  integrated  and  the  assimilation 
performed  for  5  days  at  a  time,  with  the  analysis  at  the  end  of  the  5  days  becoming 
the  initial  condition  for  the  following  5-day  assimilation.  The  results  indicate  that  th 
assimilation  system  is  performing  correctly,  with  the  model-data  misfit  is  reduced 
substantially  as  examined  at  individual  profiles. 
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The  second  experiment  used  the  observations  collected  during  AOSN  II.  Starting 
from  a  free  run  solution  that  completely  misrepresented  both  the  data  and  the 
dynamics  of  the  region  during  the  selected  time  period,  the  assimilation  was  able 
to  accurately  fit  the  assimilated  data.  Also,  contrary  to  the  free  run  and  the  first 
guess,  the  upwelling  and  relaxation  events  that  dominate  the  dynamics  of  the  regions 
were  accurately  described  by  the  analysis  which  benefited  from  a  good  observation 
coverage  of  the  domain  and  a  robust  assimilation  system. 

To  avoid  redundancy,  some  glider  profiles  were  withheld  from  the  assimilation 
and  used  for  evaluation.  The  analysis  fitted  the  withheld  observations  with  the  same 
accuracy  as  the  assimilated  observations.  This  was  due  in  part  to  the  proximity  of 
the  withheld  observations  with  those  that  were  assimilated. 

The  assimilated  solution  with  MODAS  data  fits  80  %  and  90  %  of  the 
observations  to  within  one  and  two  standard  deviations  respectively,  while  the 
corresponding  numbers  for  the  first  guess  are  60  %  and  75  %,  and  45  %  and  63  %  for 
the  free  running  model.  Some  posterior  misfits,  although  only  a  small  percentage, 
are  larger  than  7  observations  standard  deviations,  which  obviously  violate  the 
Gaussian  assumption  on  the  errors  in  general.  Similarly,  for  the  AOSN  II  data,  the 
assimilated  solution  fits  86  %  and  95  %  of  the  observations  to  within  one  and  two 
standard  deviations  respectively,  while  the  corresponding  numbers  for  the  first  guess 
are  68  %  and  80  %,  and  64  %  and  76  %  for  the  free  running  model. 

The  largest  discrepancies  between  the  first  guess  and  the  observations  were 
mostly  confined  to  the  upper  ocean.  After  the  first  5*  day  assimilation  the  first  guess 
discrepancies  grew  quickly  from  their  small  initial  values,  confirming  that  the  model 
is  being  forced  by  surface  fluxes  that  are  not  compatible  with  the  observations. 
This  was  purposefully  set  up  in  order  to  test  the  assimilation’s  ability  to  efficiently 
reduce  these  discrepancies  while  estimating  what  appears  to  be  magnitude-wise  a 
reasonable  correction  to  the  surface  fluxes. 
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Appendix 

The  discretization  of  NCOM  uses  second-order  interpolation  and  differentiation  as 
defined  with  the  notations: 


0  =  0.5  (0X  +  AX/2  +  , 


d<p 

dx 


=  ^  {<P*+*x/2  “  <Px-\x/l)  , 
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s2t(p  =  (<f>t  +  bi  -  ^r— Ac) 

The  NCOM  equations  are  then  discretized  in  flux  conservative  form  as  follows 


AxuA  yk 

ikf 


S2t  (A zuu)  =  AxA>  Az  (/  +  Ccurv)  v>  -  AyuAzug8x(C  +  U.  -  Ktp) 


-  AxuA zu—8x(pt) 
Po 


—  8X  ^AyliAzuuaXux^  -  8y  ( AxvAzvvaXuy^ 

-  St  (AxAvVir) 
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+  Ajc“  A>’"6;  - - 


(*0 


Aat^A^' 

2Ar 


<5zr  (Az'V)  =  A*A.yAz(/  +  Ccury)ux  -  AxvA zvgSy  (<*  +  ?afm  -</f) 
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Km 


(Sr') 
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AxA_y 

~2a7 

AxAy 


<$2r  (Az)  =  —8X  ( AyuAzuua )  —  8y  (Ax^Az^v0)  —  ^(AxAyw)  (15.16) 
2^t  S2l  (AzT)  =  -8X  (a  yAzVT*)  -  8y  (a*vA  zVf) 
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+  AxA ySt  +  AxAyQrSty  (15. 


*tY  (15.17) 


^~-S2t(AzS)  =  -6x(AyuAz‘4uaS*)  -  Sy(AxvAzv^Sy)  -  <$*(AxAywS*) 


In  (15. 14,  15.15, 15.16, 15.17  and  15.18),  Fu  and  Fv  are  the  horizontal  mixing  terms, 
$atm  and  $tp  are  the  atmospheric  surface  pressure  and  tidal  potential  respectively, 
and  f *  is  the  surface  elevation  term  that  can  be  distributed  among  any  of  the  three 


time  levels,  =  arf"*1  -f  -f  ay$n  \  according  to  the  temporal  weighting 


terms  a\,  a2,  or  c*3,  which  are  specified  by  the  user.  Am  and  Ah  are  the  horizontal 
mixing  coefficients  for  the  velocity  and  scalar  fields  (temperature  and  salinity) 
respectively,  likewise  Km  and  Km  for  the  vertical  mixing,  Q  is  a  volume  flux 
source  term  (with  Tior,  5son  u SOr,  and  vSOf  as  the  term  source  values),  Qr  is  the 
solar  radiation,  y  is  a  function  describing  the  solar  extinction.  Ax ,  Ay  and  A z 
denote  the  grid-cell  dimensions  defined  at  the  center  of  the  grid  cells,  and  the 
superscripts  m,  v  and  w  indicate  the  grid-cell  dimensions  computed  at  those  velocity 
locations  on  the  staggered  Arakawa  C-grid.  /  is  the  Coriolis  term,  po  and  p{  are 
the  reference  density  of  seawater  and  the  internal  pressure,  respectively,  and  the 
horizontal  advection  velocity  terms  are  given  by  m*  and  v*.  The  term  Ccurv  is  used 
to  correct  the  horizontal  advection  of  momentum  for  the  horizontal  curvature  of  the 
grid.  It  is  calculated  as 


The  horizontal  mixing  terms  for  the  momentum  equations  are  given  by 


A*VA  zvA\( 
Axv 


(15.20) 
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where  the  mixing  coefficient  is  modeled  according  the  Smagorinsky  formula 


Am  =  Cs  mag  Ax  Ay 


1  „  _ 


2  Ay 


<$2vU"  + 
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1 


2Ax 


1  SlrV"^ 


(15.22) 


with  the  magnitude  of  the  eddy  coefficient  being  scaled  by  the  constant  Csmag.  The 
vertical  mixing  coefficients  are  computed  using  the  turbulence  closure  by  Mellor 
and  Yamada  in  either  2  or  2.5  version. 

The  computation  for  the  free-surface  mode  is  governed  by  the  equations: 


AyuDugSx  (a,r+1  +  <*2?  +  a^1)  +  DuGl  (15.23) 

A xvDvg8y  (a,r+l  +  +  a^n~l)  +  DvGl  (15.24) 

8X  ^Ay“  (D,'«)',+1  4-  ( D\ )n  +  fiy  I 

■8y  (a^  (F'v)n+1  +  02  (F'v)”  +  02  | 

4-  AxAyDQ ,  (15.25)  j 

where  fa  and  ^3  are  positive  constants  define  by  the  user  with  f}\  4-/?2  4- /?3  =  1. 
DUGU  and  DVGV  are  the  vertical  integrals  of  all  the  terms  in  the  right  hand  side 
of  (15.14)  and  (15,15)  respectively,  with  the  exception  of  the  surface  elevation 
gradient  terms  and  the  vertical  mixing,  and  Du  =  Dx  and  Dv=  Dy.  The  free- 
surface  mode  (15.25)  is  solved  by  first  substituting  (Du«)rt+1  and  (Z)vv)n+i  from 
the  time  discreti/cd  ( 1 5.23)  and  (15.24)  into  (1 5.25),  resulting  in  an  elliptic  equation 
that  is  solved  for  the  surface  elevation  at  time  level  n  4-  1 ,  which  is  then  substituted 
back  in  (15.23)  and  (15.24)  for  computing  the  barotropic  transports  Duu  and  D  v 
from  which  the  barotropic  velocities  are  obtained. 

The  vertical  discretization  uses  a  combination  of  sigma  layers  and  z-levels  in  a 
three-tiered  distribution  with  (1)  free  sigma  layers  near  the  surface  that  expand  and 
contract  with  the  free  surface  elevation,  (2)  fixed  sigma  layers  that  do  not  vary  with 
the  free  surface,  and  (3)  fixed  z  levels  that  allow  for  partial  bottom  cells  for  a  better 
match  of  the  bottom  topography. 
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